Automated Coding of Open-Ended Survey Responses
Abstract
Background: Manual coding of open-ended survey responses is time-consuming, expensive, and error-prone. However, this task can be recast as a multi-label text classification problem, to which natural language processing and machine learning techniques can be applied. Aim: This work assesses the suitability of various machine learning techniques for the automated coding of open-ended survey responses, and explores techniques for automating the search for an appropriate model. Data: The data for this project consist of approximately 28,500 manually coded responses by 2,000 respondents to open-ended survey questions from the 2008 American National Election Studies survey. Methods: We compare two types of machine learning methods on this task: L1-regularized logistic regression and recurrent neural networks. To provide a fair comparison while minimizing bias in favor of novelty, we use Bayesian optimization to perform an automated search over the configuration space of these two types of models, and we use a reusable holdout in an attempt to prevent overfitting. Results: For a traditional logistic regression model, we find that Bayesian optimization is an effective way to do feature selection and hyperparameter tuning. For recurrent neural networks, which involve a much larger space of possible configurations, we have been unable to produce a consistent improvement over the baseline in performance on this data. Conclusions: Although we do not presently recommend recurrent neural networks as an effective or easy-to-use approach to this kind of limited-data multi-label classification problem, we believe that, with further work, recurrent neural networks have advantages that can be realized through an automated model selection process.
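The abstract mentions a reusable holdout but does not say which mechanism or parameters were used. As an illustration only, the sketch below shows a simplified variant of the Thresholdout idea: the holdout score is revealed (with noise) only when it disagrees with the training score by more than a noisy threshold, so that repeated queries during an automated model search leak less information about the holdout set. The function names and the `threshold` and `sigma` values are assumptions made for this example, not details taken from the paper.

```python
import random


def laplace(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def thresholdout(train_score, holdout_score, threshold=0.04, sigma=0.01, rng=None):
    """Simplified Thresholdout-style query (illustrative, hypothetical defaults).

    Returns the training score when it agrees with the holdout score to
    within a noisy threshold; otherwise returns a noised holdout score.
    """
    rng = rng or random.Random()
    if abs(train_score - holdout_score) > threshold + laplace(sigma, rng):
        # The model looks overfit to the training data: answer from the
        # holdout set, but add noise so the exact value is not revealed.
        return holdout_score + laplace(sigma, rng)
    # Scores agree: answer from the training data, leaving the holdout unused.
    return train_score


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical scores for two candidate configurations.
    print(thresholdout(0.80, 0.79, rng=rng))  # scores agree closely
    print(thresholdout(0.95, 0.70, rng=rng))  # large train/holdout gap
```

In a search loop, each candidate configuration would be scored through `thresholdout` rather than by reading the holdout score directly. The full algorithm also tracks a query budget, which this sketch omits.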
Similar resources
Automated Coding of Open-ended Surveys: Technical and Ethical Issues
This paper presents some technical and ethical issues arising from the use of automated methods to solve a typical social science problem: the coding of surveys including answers to open-ended questions. Coding an open-ended survey, which may include thousands of interviews, means to assign symbolic predefined labels to its answers according to their meaning. The increasing amount of informatio...
Assessing creative problem-solving with automated text grading
The work aims to improve the assessment of creative problem-solving in science education by employing language technologies and computational–statistical machine learning methods to grade students’ natural language responses automatically. To evaluate constructs like creative problem-solving with validity, open-ended questions that elicit students’ constructed responses are beneficial. But the ...
Spouses of Military Members' Experiences and Insights: Qualitative Analysis of Responses to an Open-Ended Question in a Survey of Health and Wellbeing
INTRODUCTION There are few studies on the experiences of spouses of military members, with most focused on adverse impacts of deployment. Responses to an open-ended question in a survey of spouses' health and wellbeing enabled access to perceptions and insights on a broad range of topics. The objective of this investigation was to examine how respondents used the open-ended question and what th...
Automating survey coding by multiclass text categorization techniques
Survey coding is the task of assigning a symbolic code from a predefined set of such codes to the answer given in response to an open-ended question in a questionnaire (aka survey). This task is usually carried out to group respondents according to a predefined scheme based on their answers. Survey coding has several applications, especially in the social sciences, ranging from the simple class...
Information literacy in public libraries from the perspective of public libraries’ policymakers; an exploratory study
Purpose: The present paper aims to conduct an exploratory study on the status of information literacy in upstream documents and curriculums of Iran public libraries institutions for public libraries. Methodology: This is a developmental exploratory-qualitative study in terms of purpose. Research data were collected using in-depth, semi-structured interviews with policymakers and officials of p...